Performance Issues of Heterogeneous Hadoop Clusters in Cloud Computing
نویسندگان
چکیده
Nowadays most of the cloud applications process large amount of data to provide the desired results. Data volumes to be processed by cloud applications are growing much faster than computing power. This growth demands new strategies for processing and analyzing information. Dealing with large data volumes requires two things: 1) Inexpensive, reliable storagee 2) New tools for analyzing unstructured and structured data. Hadoop is a powerful open source software platform that addresses both of these problems. The current Hadoop implementation assumes that computing nodes in a cluster are homogeneous in nature. Hadoop lacks performance in heterogeneous clusters where the nodes have different computing capacity. In this paper we address the issues that affect the performance of hadoop in eterogeneous clusters and also provided some guidelines on how to overcome these
منابع مشابه
Diagnosing Heterogeneous Hadoop Clusters
We present a data-driven approach for diagnosing performance issues in heterogeneous Hadoop clusters. Hadoop is a popular and extremely successful framework for horizontally scalable distributed computing over large data sets based on the MapReduce framework. In its current implementation, Hadoop assumes a homogeneous cluster of compute nodes. This assumption manifests in Hadoop’s scheduling al...
متن کاملCloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming
The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations because the rapid development in the use of information technology in general and network technology in particular, has led to the trend of many organizations to make their applications available for use via electronic platforms hos...
متن کاملAdaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
Hadoop MapReduce framework is an important distributed processing model for large-scale data intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and a same workload is assigned to each node. Default Hadoop d...
متن کاملA Design of Heterogeneous Cloud Infrastructure for Big Data and Cloud Computing Services
Cloud computing and big data are two core services in many organizations. Combining a big data platform, such as Hadoop, into the cloud architecture using virtualization technique will result in losing the performance benefit of MapReduce. Unique for the existing virtualized big data cloud, this work introduces an innovative cloud architecture called the heterogeneous cloud. In the heterogeneou...
متن کاملParallel Rule Mining with Dynamic Data Distribution under Heterogeneous Cluster Environment
Big data mining methods supports knowledge discovery on high scalable, high volume and high velocity data elements. The cloud computing environment provides computational and storage resources for the big data mining process. Hadoop is a widely used parallel and distributed computing platform for big data analysis and manages the homogeneous and heterogeneous computing models. The MapReduce fra...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1207.0894 شماره
صفحات -
تاریخ انتشار 2011